Intel Look Inside

OpenGL® ES 3.0 and Beyond

How To Deliver Desktop Graphics on Mobile Platforms 


Chris Kirkpatrick, Jon Kennedy 


Legal 

Copyright © 2014 Intel Corporation. All rights reserved. 

*Other names and brands may be claimed as the property of others. 

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, 

BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE 
FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS 

OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, 
MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. 

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE 
INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND 
THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, 
DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR 
NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. 

Intel may make changes to specifications and product descriptions at any time, without notice. 

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. 

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current 
characterized errata are available on request. 

Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third 
parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole 
risk of the user. 

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product 
roadmaps. 

Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as 
SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results 
to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when 
combined with other products. For more information go to 
http://www.Intel.com/performance 

Iris™ graphics is available on select systems. Consult your system manufacturer. 

Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries. 



Why Is OpenGL ES 3.0 and Beyond Important?


The OpenGL ES 3.1 specification was released at GDC 2014.
http://www.khronos.org/registry/gles/ 


OpenGL ES 3.1 is reaching parity with desktop


Android is dominant in the market


Worldwide Tablet Sales to End Users, 2013 (Feb 2014): Android 62%, Microsoft 2%
http://www.gartner.com/newsroom/id/2674215


OpenGL ES 3.0 is gaining market share



Android OpenGL ES Version Support Distribution
https://developer.android.com/about/dashboards/index 

(Feb 2014) 



New Features for OpenGL ES 3.0 




Main new features 

■ Multiple Render Targets 

■ Occlusion Queries 

■ Instanced rendering 

■ Uniform Buffer Objects (UBO) and Uniform 
Blocks 

■ Transform feedback 

■ Primitive restart 

■ Program Binary 

Enhanced texturing functionality 

■ Swizzles, 3D textures, 2D array textures, 
LOD/MIP level clamps, seamless cube maps, 
immutable textures, NPOT textures, sampler 
objects 


New renderbuffer and texture formats 

■ Floating point formats 

■ Shared exponent RGB formats 

■ ETC/EAC texture compression 

■ Depth and depth/stencil formats 

■ Single- and dual-channel textures (R and RG)

ES Shading Language Version 3.00 

■ Full support for 32 bit integer/floating point 
data types (IEEE754) 

■ In/out storage qualifier 

- value copied to/from subsequent/previous 
pipeline stage 

■ Array constructors and operations 

■ New built-in functions 


OpenGL ES 3.0 - Multi-Render Targets


What is it? 


■ Enables writing to multiple framebuffer 
color buffer attachment points with a single 
pass 

Why is it useful? 

■ Techniques requiring multiple passes can 
be condensed into a single pass to save 
redundant execution of the vertex shader 

■ Useful for Deferred Shading and Screen 
Space Ambient Occlusion 


OpenGL ES 3.0 - Multi-Render Targets


Enabled by attaching framebuffer-attachable images to GL_COLOR_ATTACHMENTi of a 
created FBO 

■ Support for at least 4 attachment points 

■ Intel supports 8 

■ Maximum specified by GL_MAX_COLOR_ATTACHMENTS 


Most often used in deferred shading, e.g.

■ 1 colour buffer for the surface colours 

■ 1 colour buffer for the surface normals 

■ 1 colour buffer for the depth values 

■ 1 colour buffer for extra lighting information, such as specular or ambient occlusion 
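
As a rough illustration of the attachment rules above, the sketch below builds the array an application might later hand to glDrawBuffers. The helper name build_draw_buffers is hypothetical, and the two enum values are copied from the GLES3 headers only so the block compiles without GL headers or a context. It assumes the caller has already queried the implementation limits (GL_MAX_COLOR_ATTACHMENTS and GL_MAX_DRAW_BUFFERS) and passes in the smaller of the two.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in <GLES3/gl3.h>; repeated here only so the sketch
   compiles standalone. */
#define GL_NONE              0
#define GL_COLOR_ATTACHMENT0 0x8CE0

typedef unsigned int GLenum;

/* Hypothetical helper: fills `bufs` for a later glDrawBuffers(n, bufs) call.
   In OpenGL ES 3.0, entry i must be GL_COLOR_ATTACHMENT0 + i or GL_NONE.
   `wanted` is clamped to `max_draw_buffers`, which is assumed to be the
   smaller of the queried GL_MAX_COLOR_ATTACHMENTS and GL_MAX_DRAW_BUFFERS.
   Returns the number of entries written. */
static size_t build_draw_buffers(GLenum *bufs, size_t wanted,
                                 size_t max_draw_buffers)
{
    assert(bufs != NULL);
    size_t n = wanted < max_draw_buffers ? wanted : max_draw_buffers;
    for (size_t i = 0; i < n; ++i)
        bufs[i] = GL_COLOR_ATTACHMENT0 + (GLenum)i;
    return n;
}
```

For the four-buffer deferred-shading layout listed above, a caller would request 4 entries and pass the result to glDrawBuffers after binding the FBO.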



OpenGL ES 3.0 - Multi-Render Targets Sample 





OpenGL ES 3.0 - Occlusion Queries 


What is it? 

■ A hardware method for detecting whether an object is visible 

■ Works by testing if samples pass the depth test 

■ Queries are asynchronous, but a blocking call is available if required

Why is it useful? 

■ Remove complex scene geometry by culling large batches of geometry via bounding box 
tests 

■ Best on large scenes with large nearby occluders 



OpenGL ES 3.0 - Occlusion Queries Sample



GL ES API Code Snippet


glGenQueries(1, &query);

glBeginQuery(GL_ANY_SAMPLES_PASSED, query);

// Draw some primitives

// glEndQuery takes the query target, not the query name
glEndQuery(GL_ANY_SAMPLES_PASSED);

// Check if the result is available

glGetQueryObjectuiv(query, GL_QUERY_RESULT_AVAILABLE, &result);
if (result == GL_TRUE)
{
    // This call would block if the result were not yet available
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &anyPassed);
}



OpenGL ES 3.0 - Instanced Rendering


What is it? 

■ Enables rendering multiple geometry 
instances with a single draw call 

■ Instances may be provided with unique 
attributes (transformation, bones, etc.) 

Why is it useful? 

■ Reduces API call overhead when rendering 
duplicate meshes 






OpenGL ES 3.0 - Instanced Rendering


Non-instanced

■ BOs store Vertex, Normal, Tex data

■ Transformations stored as uniform data

■ Set per-instance with glUniformMatrix*

for (int i = 0; i < NumInstances; ++i) {
    glUniformMatrix4fv(...);
    glDrawElements(...);
}


Instanced

■ BOs store Vertex, Normal, Tex data

■ Transformations stored in a BO

■ glVertexAttribDivisor handles the creation of "instanced attributes"

glVertexAttribDivisor(3, 1);
glDrawElementsInstanced(..., NumInstances);


OpenGL ES 3.0 - Instanced Rendering Sample


GL ES API Code Snippet

// Attrib 0 (vertex information) changes per vertex
glVertexAttribDivisor(0, 0);

// Attribs 1-4 (matrix data) change per instance; a mat4 attribute
// occupies four consecutive locations and each one needs the divisor
for (int i = 1; i <= 4; ++i)
    glVertexAttribDivisor(i, 1);


// When rendering

glDrawArraysInstanced(Mode, First, Count, NumberOfInstances);

// or

glDrawElementsInstanced(Mode, Count, IndType, Indices, NumOfInstances);
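
The divisor rule can be modelled on the CPU. The sketch below is only an illustration of the indexing arithmetic, not driver code; the function names attrib_element and instanced_elements_needed are hypothetical. It assumes no base-instance offset, which matches OpenGL ES 3.0.

```c
#include <assert.h>

/* Index of the attribute element fetched for a given vertex and instance.
   divisor == 0: the attribute advances once per vertex.
   divisor  > 0: the attribute advances once every `divisor` instances. */
static unsigned attrib_element(unsigned divisor, unsigned vertex,
                               unsigned instance)
{
    return divisor == 0 ? vertex : instance / divisor;
}

/* Number of elements an instanced attribute's buffer must hold so that
   `instances` instances can all fetch a value. Only meaningful for a
   positive divisor. */
static unsigned instanced_elements_needed(unsigned divisor,
                                          unsigned instances)
{
    assert(divisor > 0);
    return (instances + divisor - 1) / divisor;
}
```

With a divisor of 1, as in the snippet above, every instance reads its own matrix, so the buffer needs one matrix per instance.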


GLES Vertex Shader Code Snippet


#version 300 es

// By default attributes have a divisor of zero, advancing per vertex
// Attributes with a positive divisor will advance every divisor instances

// The built-in variable gl_InstanceID holds the current instance
// Default value is zero; safe to reference when not using instanced draw
// calls

in vec3 Position;

// Takes attribute positions 1,2,3,4
in mat4 WorldPosition;

// Pass the instance id on to the pixel shader
flat out int InstanceID;

void main()
{
    // Transform the vertex by the per-instance matrix
    // (a mat4 cannot be added to a vec3)
    gl_Position = WorldPosition * vec4(Position, 1.0);

    InstanceID = gl_InstanceID;
}


OpenGL® ES 3.1


OpenGL ES 3.1



Intel announced support for the OpenGL ES 3.1 specification on the Bay Trail platform for Android.

http://blogs.intel.com/technology/2014/03/open-gl-es-gdc-2014-sweet-spot-mobile-graphics-evolution/

"Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process when available. Current conformance status can be found at www.khronos.org/conformance."


Intel has extended support beyond the core specification to include Geometry Shaders, Tessellation and Intel Pixel Sync Technology.


OpenGL ES 3.1 Specification and header files can be found here:
http://www.khronos.org/registry/gles/ 


OpenGL ES 3.1 on Intel's Bay Trail Platform


OpenGL ES 3.1 - Compute Shaders 


What are they? 

■ A compute shader is used for general compute on shader-defined inputs with shader-defined outputs.

■ Run logically independent of the 3D pipeline,

- although well pipelined with 3D primitives.

■ Run at a user-defined frequency.

■ Similar to OpenCL® kernels.

■ Allow better integration into 3D applications.

- Can directly access OpenGL ES textures, images and buffer objects.

- Can be efficiently pipelined with 3D primitives.

- Lightweight.

Why are they useful?

■ Compute shaders are frequently used on the desktop for image post-processing, deferred rendering, visibility culling, computer vision, particle physics, etc.



Cloth using compute shaders 





OpenGL ES 3.1 - Compute Shaders 


Compute shaders work on: 

■ Workgroups

- Each workgroup consists of a number of compute shader threads.

- The user defines the workgroup size and the number of workgroups. Both parameters are in 3 dimensions.

- The workgroup size is fixed at compilation time.

- The number of workgroups is specified at dispatch time.

■ Compute Shader Threads

- Each thread can share data with other members of the workgroup via special shared variables.

- Each thread can issue memory and control barriers to synchronise with other members of the workgroup.

- Data cannot be effectively shared between workgroups, except via images, buffer objects or atomic counters.

- Each thread can uniquely identify itself within a workgroup and globally with built-in variables. This is the only method for a thread to determine where to get its input and where to write its output.
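
The relationship between the built-in identifiers can be sketched on the CPU. The C below is an illustrative model, not GL code: the uvec3 struct and the function names are hypothetical stand-ins for the GLSL built-ins gl_WorkGroupID, gl_WorkGroupSize, gl_LocalInvocationID, gl_GlobalInvocationID and gl_LocalInvocationIndex. The groups_for helper shows one common way to choose the dispatch count when the problem size is not an exact multiple of the workgroup size; in that case the shader would also need a bounds check.

```c
#include <assert.h>

typedef struct { unsigned x, y, z; } uvec3;

/* gl_GlobalInvocationID =
       gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID */
static uvec3 global_invocation_id(uvec3 group_id, uvec3 group_size,
                                  uvec3 local_id)
{
    assert(local_id.x < group_size.x);
    assert(local_id.y < group_size.y);
    assert(local_id.z < group_size.z);
    uvec3 g = {
        group_id.x * group_size.x + local_id.x,
        group_id.y * group_size.y + local_id.y,
        group_id.z * group_size.z + local_id.z
    };
    return g;
}

/* gl_LocalInvocationIndex: the local ID flattened to one dimension. */
static unsigned local_invocation_index(uvec3 group_size, uvec3 local_id)
{
    return local_id.z * group_size.x * group_size.y
         + local_id.y * group_size.x
         + local_id.x;
}

/* Workgroups needed along one axis to cover `items` work items,
   rounding up so a partial final group is still dispatched. */
static unsigned groups_for(unsigned items, unsigned local_size)
{
    assert(local_size > 0);
    return (items + local_size - 1) / local_size;
}
```

For a 512x512 image processed by 16x16 workgroups, groups_for gives 32 groups per axis.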











OpenGL ES 3.1 - Compute Shaders 


Compute shaders also bring: 

■ Shader Image Load Store

- Random read/write access to a single level of a texture map

- Atomic operations

■ Shader Storage Buffer Objects

- Random read/write access to variables stored within a buffer object

- Atomic operations

■ Shader Atomic Counters

- Backed by buffer object memory

- They allow the proper sequencing of memory accesses between workgroups

These are also available to other shader stages. 


GL ES API Code Snippet


glGenTextures(1, &texHandle);
glBindTexture(GL_TEXTURE_2D, texHandle);
// OpenGL ES 3.1 only allows immutable textures to be bound as images,
// so allocate one 512x512 R32F level with glTexStorage2D
glTexStorage2D(GL_TEXTURE_2D, 1, GL_R32F, 512, 512);


// Bind the texture to an image so it can be written to

glBindImageTexture(0, texHandle, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_R32F);


glUseProgram(computeHandle);

GLint loc = glGetUniformLocation(computeHandle, "roll");
glUniform1f(loc, frame * 0.01f);

// 512^2 threads in blocks of 16^2
glDispatchCompute(512/16, 512/16, 1);



GLSL Compute Shader Code Snippet


#version 310 es

uniform float roll;
// The ES shading language requires a format qualifier and a precision
// on image uniforms
layout (r32f, binding = 0) writeonly uniform highp image2D destTex;

layout (local_size_x = 16, local_size_y = 16) in; // 16x16 threads per workgroup
void main()
{
    ivec2 storePos = ivec2(gl_GlobalInvocationID.xy);

    float localCoef = length(vec2(ivec2(gl_LocalInvocationID.xy) - 8) / 8.0);

    float globalCoef = sin(float(gl_WorkGroupID.x + gl_WorkGroupID.y) * 0.1 + roll) * 0.5;

    imageStore(destTex, storePos, vec4(1.0 - globalCoef * localCoef, 0.0, 0.0, 0.0));
}


OpenGL ES 3.1 EXT Extensions - Tessellation Shaders


What is it? 

• An optional stage in the rendering pipeline that is capable of generating additional geometry

• More efficient than geometry shaders for high levels of geometry expansion; tessellation cannot cull individual primitives, only whole patches (by setting an outer tessellation level to zero or less).

• The control shader operates on control points and is responsible for specifying tessellation levels, per-control-point positions and per-patch varyings for the evaluation shader.

• The evaluation shader outputs the positions, normals, etc. using abstract coordinates from the tessellator

• Each invocation operates on a single vertex within the tessellated patch

Why do you want it?

• Reduces memory bandwidth/footprint

What can you do with it?

• Progressive LOD, displacement mapping, subdivision surfaces, complex hair modelling



[Figure: pipeline diagram showing the Vertex Shader stage]


OpenGL ES 3.1 EXT Extensions - Geometry Shaders


What are they? 

■ A shader which processes the output of the primitive assembler (or the tessellation evaluation shader)

■ Full access to the assembled primitive (points, lines, lines with adjacency, triangles, triangles with adjacency)

■ Output new geometry (points, line strips, triangle strips); the output type does not have to match the input type

Why are they useful?

■ Impostors, wireframe rendering, NPR, procedural geometry, shadow volume extrusion, geometry culling

■ Layered rendering (with the appropriate extensions): rendering a single primitive to multiple images without changing render targets


[Figure: pipeline diagram, Geometry Shaders followed by Rasterization]




OpenGL ES 3.1 Intel Extensions - Pixel Sync 


What is it? 

• An Intel OpenGL ES extension:

GL_INTEL_fragment_shader_ordering

• Allows synchronisation of unordered memory accesses from within a shader

• Add a single built-in to your shader at the point of synchronisation

beginFragmentShaderOrderingINTEL();

Why do you want it? 

• Fragments mapping to the same pixel using unordered memory accesses 
can cause data races 

• Fragments can be shaded out-of-order 

What can you do with it? 

• Order independent transparency 

• Programmable blending 

• Adaptive volumetric shadow maps 

• Etc 



[Figure: Adaptive Volumetric Shadow Maps (AVSM). Codemasters Grid 2 in-game screenshots, with AVSM and with no AVSM]

http://software.intel.com/en-us/articles/pixel-synchronization-solving-old-graphics-problems-with-new-data-structures




OpenGL ES 3.1 - More Information 


• More demos can be seen at the Intel Booth (#1016) in the South Hall.

• You can hear more about OpenGL ES 3.1 and its use in real games by visiting 
further Intel talks entitled: 

• "SSX: Bringing a PS3 game to Android" 

- Thursday 10-11AM

• "Adding High-end Graphical Effects to GT Racing 2 on Android x86" 

- Thursday 2:30-3:30 

• "Rendering in Codemasters' GRID2 and beyond: Achieving the ultimate graphics on both 
PC and tablet" 

- Thursday 4-5PM 




Ready for More? Look Inside™ 


Keep in touch with us at GDC and beyond: 

• Game Developer Conference 

Visit our Intel® booth #1016 in Moscone South

• Intel University Games Showcase 
Marriott Marquis Salon 7, Thursday 5:30pm 
RSVP at bit.ly/intelgame 

• Intel Developer Forum, San Francisco 
September 9-11, 2014 

intel.com/idf14

• Intel Software Adrenaline 
@inteladrenaline 

• Intel Developer Zone 
software.intel.com 
@intelsoftware 


Up Next...


3:30 - 4:30 

Multi-player, multi-touch game development: Developing games 
for the fastest growing segment in desktop! 

Presented by: 

Alex Guo - Symbio Games & Faisal Habib - Intel 



